Sparse Discriminative Information Preservation for Chinese character font categorization

Authors

Dapeng Tao

Lianwen Jin

Shuye Zhang

Zhao Yang

Yongfei Wang

Abstract

With the rapid development of optical character recognition (OCR), font categorization has become increasingly important, because font information has a wide range of uses, a point that researchers have only recently come to appreciate. In this paper, we propose a new scheme for Chinese character font categorization (CCFC) that represents font information with local binary pattern (LBP) descriptors computed at Chinese character interest points. Specifically, it classifies Chinese character fonts by combining a new Sparse Discriminative Information Preservation (SDIP) method for feature selection with a nearest neighbor (NN) classifier. SDIP focuses on three aspects: (1) it simultaneously preserves the local geometric structure of intra-class samples and maximizes the margin between inter-class samples on each local patch; (2) it models the reconstruction error to preserve prior information about the data distribution; and (3) it introduces an L1-norm penalty to make the projection matrix sparse. We conduct experiments on a newly collected set of text block images covering 25 popular Chinese fonts. The average recognition rate demonstrates the robustness and effectiveness of SDIP for CCFC. © 2013 Elsevier B.V. All rights reserved.
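
The abstract names three ingredients: LBP descriptors at interest points, an SDIP projection learned with local-patch, reconstruction, and L1 terms, and an NN classifier. The sketch below illustrates each ingredient under stated assumptions; it is not the authors' implementation. The basic 3x3 LBP variant, the patch `radius`, the interest-point coordinates, the exact form of the local-patch and reconstruction terms, and the weights `k1`, `k2`, `beta`, `mu`, `lam` are all hypothetical. The paper's solver for the sparse projection is not described here, so `soft_threshold` only shows how an L1 penalty drives small entries of a projection matrix to exactly zero, as the proximal step of a generic proximal-gradient method would.

```python
import numpy as np

# Neighbour offsets (dy, dx) for the basic 3x3 LBP, clockwise from the top-left.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 8-neighbour LBP code (0..255) for every interior pixel of a 2-D array."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= centre).astype(np.int64) << bit
    return codes


def lbp_histogram(patch):
    """Normalised 256-bin histogram of LBP codes over a patch (at least 3x3)."""
    hist = np.bincount(lbp_codes(patch).ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def describe(img, points, radius=4):
    """Concatenate LBP histograms of square patches centred on interest points.

    Interest-point detection is not specified here: `points` is a hypothetical
    list of (row, col) coordinates supplied by the caller.
    """
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    feats = []
    for r, c in points:
        # After padding, pixel (r, c) sits at (r + radius, c + radius).
        patch = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
        feats.append(lbp_histogram(patch))
    return np.concatenate(feats)


def soft_threshold(W, t):
    """Proximal operator of t * ||W||_1: shrinks entries toward zero, zeroing small ones."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def patch_objective(X, y, W, k1=2, k2=2, beta=0.5, mu=1.0, lam=0.1):
    """Hypothetical SDIP-style cost for a projection W (D x d); lower is better.

    Sums (1) a local-patch term that pulls each projected sample toward its k1
    nearest same-class neighbours and pushes it away from its k2 nearest
    other-class neighbours, (2) mu times the error of reconstructing X from its
    projection, and (3) lam times the L1 norm of W.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    Y = X @ W
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.arange(len(X))
    local = 0.0
    for i in idx:
        same = idx[(y == y[i]) & (idx != i)]
        diff = idx[y != y[i]]
        # Neighbours are chosen by distance in the original feature space.
        same = same[np.argsort(dist[i, same])[:k1]]
        diff = diff[np.argsort(dist[i, diff])[:k2]]
        local += ((Y[i] - Y[same]) ** 2).sum() - beta * ((Y[i] - Y[diff]) ** 2).sum()
    recon = ((X - Y @ W.T) ** 2).sum()
    return local + mu * recon + lam * np.abs(W).sum()


def nn_classify(train_X, train_y, test_X, W):
    """1-nearest-neighbour labels for test_X, measured in the projected space."""
    P_train = np.asarray(train_X, dtype=float) @ W
    P_test = np.asarray(test_X, dtype=float) @ W
    d = ((P_test[:, None, :] - P_train[None, :, :]) ** 2).sum(-1)
    return np.asarray(train_y)[d.argmin(axis=1)]
```

In use, `describe` would turn each text block image into one feature vector, a sparse projection `W` would be learned by minimizing some objective of the `patch_objective` kind, and `nn_classify` would assign fonts in the projected space. Evaluating the cost separately from the solver keeps the three terms easy to inspect one at a time.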

Similar resources

A Prototype of Multi-Font Printed Chinese Character Reader

An approach to multi-font printed Chinese character recognition is proposed in this paper. The problems of inputting image of characters, preprocessing, character segmentation, feature extraction as well as character classification have been discussed. According to the characteristics of multi-font printed Chinese characters, the number of cutting across strokes, the external and internal areas w...

High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training

We’ve been developing a Chinese OCR engine for machine printed documents. Currently, our OCR engine can support a vocabulary of 6921 characters which include 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, He, Yuan, LiShu, WeiBei, Xin...

Font Recognition of Chinese Character Based on Multi-Scale Wavelet

Optical character recognition system research has achieved great success, but layout reconstruction requires the fonts of the characters. In this paper, a novel font recognition algorithm based on multi-scale wavelet analysis is proposed. We adopt wavelet analysis and the grid method to process the character image, extract wavelet energy density features, and apply the BP ...

Effect of Pixel’s Spatial Characteristics on Recognition of Isolated Pixelized Chinese Character

The influence of pixels' spatial characteristics on the recognition of isolated Chinese characters was investigated using simulated prosthetic vision. The accuracy of Chinese character recognition with 4 pixel counts (6*6, 8*8, 10*10, and 12*12 pixel arrays), 3 pixel shapes (Square, Dot and Gaussian) and different pixel spacings was tested using a head-mounted display (HMD). A ca...

A Unified Semantic Embedding: Relating Taxonomies and Attributes

We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes. Contrary to prior work, which only utilized them as side information, we explicitly embed these semantic entities into the same space where we embed categories, which enables us to represent a category as their lin...

Journal: Neurocomputing

Volume 129

Publication date: 2014
